NSF PAR Search | NSF Public Access Repository

Learning Contextualized Action Representations in Sequential Decision Making for Adversarial Malware Optimization

https://doi.org/10.1109/TDSC.2024.3477272

Ebrahimi, Reza; Pacheco, Jason; Hu, James; Chen, Hsinchun (May 2025, IEEE Transactions on Dependable and Secure Computing)

Full Text Available
Large Language Models for Conducting Advanced Text Analytics Information Systems Research

https://doi.org/10.1145/3682069

Ampel, Benjamin; Yang, Chi-Heng; Hu, James; Chen, Hsinchun (March 2025, ACM Transactions on Management Information Systems)

The exponential growth of digital content has generated massive textual datasets, necessitating the use of advanced analytical approaches. Large Language Models (LLMs) have emerged as tools that are capable of processing and extracting insights from massive unstructured textual datasets. However, how to leverage LLMs for text analytics Information Systems (IS) research is currently unclear. To assist the IS community in understanding how to operationalize LLMs, we propose a Text Analytics for Information Systems Research (TAISR) framework. Our proposed framework provides detailed recommendations grounded in IS and LLM literature on how to conduct meaningful text analytics IS research for design science, behavioral, and econometric streams. We conducted three business intelligence case studies using our TAISR framework to demonstrate its application in several IS research contexts. We also outline the potential challenges and limitations of adopting LLMs for IS. By offering a systematic approach and evidence of its utility, our TAISR framework contributes to future IS research streams looking to incorporate powerful LLMs for text analytics.
Full Text Available
Evading Deep Learning-Based Malware Detectors via Obfuscation: A Deep Reinforcement Learning Approach

https://doi.org/10.1109/ICDM58522.2023.00019

Etter, Brian; Hu, James Lee; Ebrahimi, Mohammadreza; Li, Weifeng; Li, Xin; Chen, Hsinchun (December 2023, IEEE)

Adversarial Malware Generation (AMG), the generation of adversarial malware variants to strengthen Deep Learning (DL)-based malware detectors, has emerged as a crucial tool in the development of proactive cyberdefense. However, the majority of extant works offer subtle perturbations or additions to executable files and do not explore full-file obfuscation. In this study, we show that an open-source encryption tool coupled with a Reinforcement Learning (RL) framework can successfully obfuscate malware to evade state-of-the-art malware detection engines and outperform techniques that use advanced modification methods. Our results show that the proposed method improves the evasion rate by 27% to 49% compared to widely used state-of-the-art reinforcement learning-based methods.
Full Text Available
Binary Black-Box Attacks Against Static Malware Detectors with Reinforcement Learning in Discrete Action Spaces

https://doi.org/10.1109/SPW53761.2021.00021

Ebrahimi, Mohammadreza; Pacheco, Jason; Li, Weifeng; Hu, James Lee; Chen, Hsinchun (May 2021, IEEE S&P Workshop on Deep Learning and Security (DLS))
Full Text Available
Binary Black-box Evasion Attacks Against Deep Learning-based Static Malware Detectors with Adversarial Byte-Level Language Model

https://arxiv.org/abs/2012.07994

Ebrahimi, Mohammadreza; Zhang, Ning; Hu, James; Raza, Muhammad Taqi; Chen, Hsinchun (January 2021, AAAI Workshop on Robust, Secure and Efficient Machine Learning (RSEML))
Full Text Available
Linking Personally Identifiable Information from the Dark Web to the Surface Web: A Deep Entity Resolution Approach

https://doi.org/10.1109/ICDMW51313.2020.00072

Lin, Fangyu; Liu, Yizhi; Ebrahimi, Mohammadreza; Ahmad-Post, Zara; Hu, James Lee; Xin, Jingyu; Samtani, Sagar; Li, Weifeng; Chen, Hsinchun (November 2020, International Conference on Data Mining Workshops (ICDMW))
The information privacy of Internet users has become a major societal concern. The rapid growth of online services increases the risk of unauthorized access to Personally Identifiable Information (PII) of at-risk populations, who are unaware of their PII exposure. To proactively identify online at-risk populations and increase their privacy awareness, it is crucial to conduct a holistic privacy risk assessment across the Internet. Current privacy risk assessment studies are limited to a single platform within either the surface web or the dark web. A comprehensive privacy risk assessment requires matching exposed PII on heterogeneous online platforms across the surface web and the dark web. However, due to the incompleteness and inaccuracy of PII records in each platform, linking the exposed PII to users is a non-trivial task. While Entity Resolution (ER) techniques can be used to facilitate this task, they often require ad hoc, manual rule development and feature engineering. Recently, Deep Learning (DL)-based ER has outperformed manual entity matching rules by automatically extracting prominent features from incomplete or inaccurate records. In this study, we enhance the existing privacy risk assessment with a DL-based ER method, namely Multi-Context Attention (MCA), to comprehensively evaluate individuals’ PII exposure across different online platforms in the dark web and the surface web. Evaluation against benchmark ER models indicates the efficacy of MCA. Using MCA on a random sample of data breach victims in the dark web, we are able to identify 4.3% of the victims on surface web platforms and calculate their privacy risk scores.
Full Text Available
Identifying, Collecting, and Monitoring Personally Identifiable Information: From the Dark Web to the Surface Web

https://doi.org/10.1109/ISI49825.2020.9280540

Liu, Yizhi; Lin, Fang Yu; Ahmad-Post, Zara; Ebrahimi, Mohammadreza; Zhang, Ning; Hu, James Lee; Xin, Jingyu; Li, Weifeng; Chen, Hsinchun (November 2020, IEEE International Conference on Intelligence and Security Informatics (IEEE ISI 2020))
Full Text Available
Crowdsourcing biocuration: The Community Assessment of Community Annotation with Ontologies (CACAO)

https://doi.org/10.1371/journal.pcbi.1009463

Ramsey, Jolene; McIntosh, Brenley; Renfro, Daniel; Aleksander, Suzanne A.; LaBonte, Sandra; Ross, Curtis; Zweifel, Adrienne E.; Liles, Nathan; Farrar, Shabnam; Gill, Jason J.; et al (October 2021, PLOS Computational Biology)
Ouellette, Francis (Ed.)
Experimental data about gene functions curated from the primary literature have enormous value for research scientists in understanding biology. Using the Gene Ontology (GO), manual curation by experts has provided an important resource for studying gene function, especially within model organisms. Unprecedented expansion of the scientific literature and validation of the predicted proteins have increased both data value and the challenges of keeping pace. Capturing literature-based functional annotations is limited by the ability of biocurators to handle the massive and rapidly growing scientific literature. Within the community-oriented wiki framework for GO annotation called the Gene Ontology Normal Usage Tracking System (GONUTS), we describe an approach to expand biocuration through crowdsourcing with undergraduates. This multiplies the number of high-quality annotations in international databases, enriches our coverage of the literature on normal gene function, and pushes the field in new directions. From an intercollegiate competition judged by experienced biocurators, Community Assessment of Community Annotation with Ontologies (CACAO), we have contributed nearly 5,000 literature-based annotations. Many of those annotations are to organisms not currently well-represented within GO. Over a 10-year history, our community contributors have spurred changes to the ontology not traditionally covered by professional biocurators. The CACAO principle of relying on community members to participate in and shape the future of biocuration in GO is a powerful and scalable model used to promote the scientific enterprise. It also provides undergraduate students with a unique and enriching introduction to critical reading of primary literature and acquisition of marketable skills.
Full Text Available
